The area under the ROC curve as a measure of clustering quality

Abstract

The Area Under the Receiver Operating Characteristics (ROC) Curve, referred to as AUC, is a well-known performance measure in the supervised learning domain. Due to its compelling features, it has been employed in a number of studies to evaluate and compare the performance of different classifiers. In this work, we explore AUC as a performance measure in the unsupervised learning domain, more specifically, in the context of cluster analysis. In particular, we elaborate on the use of AUC as an internal/relative measure of clustering quality, which we refer to as Area Under the Curve for Clustering (AUCC). We show that the AUCC of a given candidate clustering solution has an expected value under a null model of random clustering solutions, regardless of the size of the dataset and, more importantly, regardless of the number or the (im)balance of clusters under evaluation. In addition, we elaborate on the fact that, in the internal/relative clustering validation setting we consider, AUCC is actually a linear transformation of the Gamma criterion from Baker and Hubert (1975), for which we also formally derive a theoretical expected value for chance clusterings. We also discuss the computational complexity of these criteria and show that, while an ordinary implementation of Gamma can be computationally prohibitive and impractical for most real applications of cluster analysis, its equivalence with AUCC actually unveils a much more efficient algorithmic procedure. Our findings are supported by experimental results. These results show that, in addition to the effective and robust quantitative evaluation provided by AUCC, visual inspection of the ROC curves themselves can be useful to further assess a candidate clustering solution from a broader, qualitative perspective as well.
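
The abstract does not spell out how AUCC is computed, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that every unordered pair of objects is one "sample", that a pair's score is its negated Euclidean distance, and that a pair is "positive" when both objects carry the same cluster label. AUCC is then obtained from average ranks (the Mann-Whitney formulation), which is dominated by sorting the N = n(n-1)/2 pair scores, i.e. O(n² log n). For comparison, the hypothetical `baker_hubert_gamma` counts concordant comparisons S⁺ (a within-cluster distance smaller than a between-cluster distance) and discordant ones S⁻ by brute force, touching every within/between combination, which is O(n⁴). With no tied distances, AUCC = S⁺/(S⁺ + S⁻), so Gamma = (S⁺ - S⁻)/(S⁺ + S⁻) = 2·AUCC - 1, the linear relation the abstract mentions. All function names are illustrative and NumPy is assumed to be available.

```python
import numpy as np


def pair_distances_and_labels(X, labels):
    """Euclidean distance and same-cluster flag for each of the n(n-1)/2 pairs."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(X), k=1)          # every unordered pair exactly once
    dist = np.linalg.norm(X[i] - X[j], axis=1)   # assumption: Euclidean distance
    same = labels[i] == labels[j]                # positive class: within-cluster pair
    return dist, same


def average_ranks(a):
    """1-based ranks of a; tied values share the mean of their ranks."""
    _, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)                     # rank of the last element in each tie group
    starts = ends - counts + 1                   # rank of the first element in each tie group
    return ((starts + ends) / 2.0)[inverse]


def aucc(X, labels):
    """AUC with score = -distance and positives = within-cluster pairs.

    Rank-sum (Mann-Whitney) identity: cost is dominated by sorting the pair scores.
    Tied comparisons contribute 0.5 each.
    """
    dist, same = pair_distances_and_labels(X, labels)
    n_pos = int(same.sum())
    n_neg = same.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both within- and between-cluster pairs")
    ranks = average_ranks(-dist)                 # closer pairs receive higher ranks
    u = ranks[same].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def baker_hubert_gamma(X, labels):
    """Brute-force Gamma: every within-cluster distance vs. every between-cluster
    distance, O(n_w * n_b) time and memory (O(n^4) in the number of objects)."""
    dist, same = pair_distances_and_labels(X, labels)
    within, between = dist[same], dist[~same]
    s_plus = int((within[:, None] < between[None, :]).sum())   # concordant
    s_minus = int((within[:, None] > between[None, :]).sum())  # discordant
    if s_plus + s_minus == 0:
        raise ValueError("no untied within/between comparisons")
    return (s_plus - s_minus) / (s_plus + s_minus)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic demo data: two Gaussian blobs of 20 points each.
    X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
    labels = np.repeat([0, 1], 20)
    a = aucc(X, labels)
    g = baker_hubert_gamma(X, labels)
    print(f"AUCC = {a:.4f}, Gamma = {g:.4f}, 2*AUCC - 1 = {2 * a - 1:.4f}")
```

With tied distances (for example, points on an integer grid) this sketch scores ties as 0.5 inside AUCC but drops them from Gamma, so the identity is then only approximate; the abstract does not say how the paper handles ties.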

Similar resources

The Area under the ROC Curve as a Criterion for Clustering Evaluation

In the literature, there are several criteria for validation of a clustering partition. Those criteria can be external or internal, depending on whether we use prior information about the true class labels or only the data itself. All these criteria assume a fixed number of clusters k and measure the performance of a clustering algorithm for that k. Instead, we propose a measure that provides t...

Boosting the Area under the ROC Curve

We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. We further show that this boosting can be performed even in the presence of independent misclassification noise, given access to a noise-tolerant weak ranker.

Area under the curve as a measure of discounting.

We describe a novel approach to the measurement of discounting based on calculating the area under the empirical discounting function. This approach avoids some of the problems associated with measures based on estimates of the parameters of theoretical discounting functions. The area measure may be easily calculated for both individual and group data collected using any of a variety of current...

Estimation of the area under the ROC curve.

The area under the receiver operating characteristic curve is frequently used as a measure for the effectiveness of diagnostic markers. In this paper we discuss and compare estimation procedures for this area. These are based on (i) the Mann-Whitney statistic; (ii) kernel smoothing; (iii) normal assumptions; (iv) empirical transformations to normality. These are compared in terms of bias and ro...

Journal

Journal title: Data Mining and Knowledge Discovery

Year: 2022

ISSN: 1573-756X, 1384-5810

DOI: https://doi.org/10.1007/s10618-022-00829-0